An Improved Multiple Faults Reassignment based Recovery in Cluster Computing

Authors

  • Sanjay Bansal
  • Sanjeev Sharma
Abstract

With multiple node failures, performance drops sharply compared with a single node failure. Node failures in cluster computing can be tolerated through multiple-fault-tolerant computing. Existing recovery schemes are efficient for a single fault but not for multiple faults. The recovery scheme proposed in this paper has two phases: a sequential phase and a concurrent phase. In the sequential phase, the loads of all working nodes are distributed evenly by the proposed dynamic rank-based load distribution algorithm. In the concurrent phase, the loads of all failed nodes, as well as newly arriving jobs, are assigned evenly to the available nodes by the failed-node job allocation algorithm, which simply finds the least loaded node among them. Running the algorithms sequentially and concurrently improves both performance and resource utilization. The dynamic rank-based load redistribution algorithm serves as the sequential restoration algorithm, and the algorithm that reassigns the jobs of failed nodes to the least loaded computing nodes serves as the concurrent recovery reassignment algorithm. Because load is distributed evenly among all available working nodes with fewer iterations, lower iteration time, and lower communication overhead, performance improves. The dynamic ranking algorithm is a low-overhead, fast-converging algorithm for reassigning tasks uniformly among all available nodes. Jobs from failed nodes are reassigned by a low-overhead, efficient failure job allocation algorithm. Test results demonstrating the effectiveness of the proposed scheme are presented.
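
The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no pseudocode, so the function names `rebalance` and `reassign`, the unit-sized jobs, and the one-shot ranking step are all assumptions made for illustration. The sketch only shows the general idea of evening out loads by rank and then sending each orphaned or new job to the currently least loaded node.

```python
import heapq


def rebalance(loads):
    """Sequential phase (assumed form): even out loads across working nodes.

    loads maps node name -> number of unit-sized jobs (an assumption).
    Returns a new mapping whose loads differ by at most one job.
    """
    total = sum(loads.values())
    base, extra = divmod(total, len(loads))
    # Rank nodes from most to least loaded; the heaviest nodes keep the
    # leftover jobs so that fewer jobs have to move.
    ranked = sorted(loads, key=lambda n: (-loads[n], n))
    return {node: base + (1 if rank < extra else 0)
            for rank, node in enumerate(ranked)}


def reassign(loads, jobs):
    """Concurrent phase (assumed form): place each job on the least loaded node.

    jobs is a list of job ids coming from failed nodes or new arrivals.
    Returns (placement, updated_loads).
    """
    heap = [(load, node) for node, load in loads.items()]
    heapq.heapify(heap)
    placement = {}
    for job in jobs:
        load, node = heapq.heappop(heap)   # current least loaded node
        placement[job] = node
        heapq.heappush(heap, (load + 1, node))
    return placement, {node: load for load, node in heap}


if __name__ == "__main__":
    working = rebalance({"n1": 7, "n2": 1, "n3": 4})
    placement, final = reassign(working, ["j1", "j2", "j3", "j4"])
    print(working)
    print(placement)
    print(final)
```

A heap keeps each least-loaded lookup at logarithmic cost in the number of nodes, which is one plausible way to keep the overhead of the reassignment step low; the paper itself may use a different mechanism.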

Similar resources

An Opportunity Cost Approach for Job Assignment and Reassignment in a Scalable Computing Cluster

A new method is presented for job assignment to and reassignment between machines in a computing cluster. Our method is based on a theoretical framework that has been experimentally tested and shown to be useful in practice. This “opportunity cost” method converts the usage of several heterogeneous resources in a machine to a single homogeneous “cost.” Assignment and reassignment is then perfor...

Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling are important issues for large-scale computational grids because of the unreliable nature of grid resources. A commonly exploited technique to realize fault tolerance is periodic checkpointing that periodically ...

An Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster

A new method is presented for job assignment to and reassignment between machines in a computing cluster. Our method is based on a theoretical framework that has been experimentally tested and shown to be useful in practice. This “opportunity cost” method converts the usage of several heterogeneous resources in a machine to a single homogeneous “cost.” Assignment and reassignment are then perf...

Detection of Single and Dual Incipient Process Faults Using an Improved Artificial Neural Network

Changes in the physicochemical conditions of a process unit, even under control, may lead to what are generically referred to as faults. Identifying the causes is very important, because the system can then be diagnosed and the fault tolerated. In this article, we discuss and propose an artificial neural network that can detect incipient and gradual faults, either individually or mutually. The mai...

Tolerance to Multiple Transient Faults for Aperiodic Tasks in Hard Real-Time

Real-time systems are being increasingly used in several applications which are time-critical in nature. Fault tolerance is an essential requirement of such systems, due to the catastrophic consequences of not tolerating faults. In this paper, we study a scheme that guarantees the timely recovery from multiple faults within hard real-time constraints in uniprocessor systems. Assuming earliest-d...

Journal title:
  • CoRR

Volume abs/1102.2616  Issue 

Pages  -

Publication date 2010